{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "    <p style=\"text-align:center\">\n",
    "        <img alt=\"phoenix logo\" src=\"https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg\" width=\"200\"/>\n",
    "        <br>\n",
    "        <a href=\"https://docs.arize.com/phoenix/\">Docs</a>\n",
    "        |\n",
    "        <a href=\"https://github.com/Arize-ai/phoenix\">GitHub</a>\n",
    "        |\n",
    "        <a href=\"https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q\">Community</a>\n",
    "    </p>\n",
    "</center>\n",
    "<h1 align=\"center\">Tracing and Evaluating a DSPy Application</h1>\n",
    "\n",
    "DSPy is a framework for automatically prompting and fine-tuning language models. It provides:\n",
    "\n",
    "- Composable and declarative APIs that allow developers to describe the architecture of their LLM application in the form of a \"module\" (inspired by PyTorch's `nn.Module`),\n",
    "- Compilers known as \"teleprompters\" that optimize a user-defined module for a particular task. The term \"teleprompter\" is meant to evoke \"prompting at a distance,\" and could involve selecting few-shot examples, generating prompts, or fine-tuning language models.\n",
    " \n",
    "Phoenix makes your DSPy applications *observable* by visualizing the underlying structure of each call to your compiled DSPy module and surfacing problematic spans of execution based on latency, token count, or other evaluation metrics.\n",
    "\n",
    "In this tutorial, you will:\n",
    "- Build and compile a  DSPy module that uses retrieval-augmented generation to answer questions over the [HotpotQA dataset](https://hotpotqa.github.io/wiki-readme.html),\n",
    "- Instrument your application using [OpenInference](https://github.com/Arize-ai/openinference), and open standard for recording your LLM telemetry data,\n",
    "- Inspect the traces and spans of your application to understand the inner works of a DSPy forward pass.\n",
    "\n",
    "ℹ️ This notebook requires an OpenAI API key.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Install Dependencies and Import Libraries\n",
    "\n",
    "Install Phoenix, DSPy, and other dependencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install \"regex~=2023.10.3\" dspy-ai  # DSPy requires an old version of regex that conflicts with the installed version on Colab\n",
    "!pip install arize-phoenix openinference-instrumentation-dspy opentelemetry-exporter-otlp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "⚠️ DSPy conflicts with the default version of the `regex` module that comes pre-installed on Google Colab. If you are running this notebook in Google Colab, you will likely need to restart the kernel after running the installation step above and before proceeding to the rest of the notebook, otherwise, your instrumentation will fail."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "import dspy\n",
    "import openai\n",
    "import phoenix as px"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Configure Your OpenAI API Key\n",
    "\n",
    "Set your OpenAI API key if it is not already set as an environment variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not (openai_api_key := os.getenv(\"OPENAI_API_KEY\")):\n",
    "    openai_api_key = getpass(\"🔑 Enter your OpenAI API key: \")\n",
    "openai.api_key = openai_api_key\n",
    "os.environ[\"OPENAI_API_KEY\"] = openai_api_key"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Configure Module Components\n",
    "\n",
    "A module consists of components such as a language model (in this case, OpenAI's GPT 3.5 turbo), akin to the layers of a PyTorch module and a retriever (in this case, ColBERTv2)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "turbo = dspy.OpenAI(model=\"gpt-3.5-turbo\")\n",
    "colbertv2_wiki17_abstracts = dspy.ColBERTv2(\n",
    "    url=\"http://20.102.90.50:2017/wiki17_abstracts\"  # endpoint for a hosted ColBERTv2 service\n",
    ")\n",
    "\n",
    "dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Load Data\n",
    "\n",
    "Load a subset of the HotpotQA dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dspy.datasets import HotPotQA\n",
    "\n",
    "# Load the dataset.\n",
    "dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=10)\n",
    "\n",
    "# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.\n",
    "trainset = [x.with_inputs(\"question\") for x in dataset.train]\n",
    "devset = [x.with_inputs(\"question\") for x in dataset.dev]\n",
    "\n",
    "print(f\"Train set size: {len(trainset)}\")\n",
    "print(f\"Dev set size: {len(devset)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each example in our training set has a question and a human-annotated answer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_example = trainset[0]\n",
    "train_example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Examples in the dev set have a third field containing titles of relevant Wikipedia articles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dev_example = devset[18]\n",
    "dev_example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Define Your RAG Module\n",
    "\n",
    "Define a signature that takes in two inputs, `context` and `question`, and outputs an `answer`. The signature provides:\n",
    "\n",
    "- A description of the sub-task the language model is supposed to solve.\n",
    "- A description of the input fields to the language model.\n",
    "- A description of the output fields the language model must produce."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class GenerateAnswer(dspy.Signature):\n",
    "    \"\"\"Answer questions with short factoid answers.\"\"\"\n",
    "\n",
    "    context = dspy.InputField(desc=\"may contain relevant facts\")\n",
    "    question = dspy.InputField()\n",
    "    answer = dspy.OutputField(desc=\"often between 1 and 5 words\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define your module by subclassing `dspy.Module` and overriding the `forward` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class RAG(dspy.Module):\n",
    "    def __init__(self, num_passages=3):\n",
    "        super().__init__()\n",
    "        self.retrieve = dspy.Retrieve(k=num_passages)\n",
    "        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)\n",
    "\n",
    "    def forward(self, question):\n",
    "        context = self.retrieve(question).passages\n",
    "        prediction = self.generate_answer(context=context, question=question)\n",
    "        return dspy.Prediction(context=context, answer=prediction.answer)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This module uses retrieval-augmented generation (using the previously configured ColBERTv2 retriever) in tandem with chain of thought in order to generate the final answer to the user."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Compile Your RAG Module"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, we'll use the default `BootstrapFewShot` teleprompter that selects good demonstrations from the the training dataset for inclusion in the final prompt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dspy.teleprompt import BootstrapFewShot\n",
    "\n",
    "\n",
    "# Validation logic: check that the predicted answer is correct.\n",
    "# Also check that the retrieved context does actually contain that answer.\n",
    "def validate_context_and_answer(example, pred, trace=None):\n",
    "    answer_EM = dspy.evaluate.answer_exact_match(example, pred)\n",
    "    answer_PM = dspy.evaluate.answer_passage_match(example, pred)\n",
    "    return answer_EM and answer_PM\n",
    "\n",
    "\n",
    "input_module = RAG()\n",
    "teleprompter = BootstrapFewShot(metric=validate_context_and_answer)\n",
    "compiled_module = teleprompter.compile(input_module, trainset=trainset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Instrument DSPy and Launch Phoenix"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we've compiled our RAG program, let's see what's going on under the hood.\n",
    "\n",
    "Launch Phoenix, which will run in the background and collect spans and traces from your instrumented DSPy application."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "phoenix_session = px.launch_app()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then instrument your application with [OpenInference](https://github.com/Arize-ai/openinference/tree/main/spec), an open standard build atop [OpenTelemetry](https://opentelemetry.io/) that captures and stores LLM application executions. OpenInference provides telemetry data to help you understand the invocation of your LLMs and the surrounding application context, including retrieval from vector stores, the usage of external tools or APIs, etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from openinference.instrumentation.dspy import DSPyInstrumentor\n",
    "from opentelemetry import trace as trace_api\n",
    "from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter\n",
    "from opentelemetry.sdk import trace as trace_sdk\n",
    "from opentelemetry.sdk.resources import Resource\n",
    "from opentelemetry.sdk.trace.export import SimpleSpanProcessor\n",
    "\n",
    "endpoint = \"http://127.0.0.1:6006/v1/traces\"\n",
    "resource = Resource(attributes={})\n",
    "tracer_provider = trace_sdk.TracerProvider(resource=resource)\n",
    "span_otlp_exporter = OTLPSpanExporter(endpoint=endpoint)\n",
    "tracer_provider.add_span_processor(SimpleSpanProcessor(span_exporter=span_otlp_exporter))\n",
    "\n",
    "trace_api.set_tracer_provider(tracer_provider=tracer_provider)\n",
    "DSPyInstrumentor().instrument()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Run Your Application"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's run our DSPy application on the dev set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for example in devset:\n",
    "    question = example[\"question\"]\n",
    "    prediction = compiled_module(question)\n",
    "    print(\"Question\")\n",
    "    print(\"========\")\n",
    "    print(question)\n",
    "    print()\n",
    "    print(\"Predicted Answer\")\n",
    "    print(\"================\")\n",
    "    print(prediction.answer)\n",
    "    print()\n",
    "    print(\"Retrieved Contexts (truncated)\")\n",
    "    print(f\"{[c[:200] + '...' for c in prediction.context]}\")\n",
    "    print()\n",
    "    print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check the Phoenix UI to inspect the architecture of your DSPy module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(phoenix_session.url)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A few things to note:\n",
    "\n",
    "- The spans in each trace correspond to the steps in the `forward` method of our custom subclass of `dspy.Module`,\n",
    "- The call to `ColBERTv2` appears as a retriever span with retrieved documents and scores displayed for each forward pass,\n",
    "- The LLM span includes the fully-formatted prompt containing few-shot examples computed by DSPy during compilation.\n",
    "\n",
    "![a tour of your traces and spans in DSPy, highlighting retriever and LLM spans in particular](https://storage.googleapis.com/arize-phoenix-assets/assets/docs/notebooks/dspy-tracing-tutorial/dspy_spans_and_traces.gif)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Congrats! You've used DSPy to bootstrap a multishot prompt with hard negative passages and chain of thought, and you've used Phoenix to observe the inner workings of DSPy and understand the internals of the forward pass."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
